Escaping Optimization Stagnation: Taking Steps Beyond Task Arithmetic via Difference Vectors
Wang, Jinping, Gao, Zhiqiang, Zhang, Dinggen, Xie, Zhiwu
Current methods for editing pre-trained models face significant challenges, primarily high computational costs and limited scalability. Task arithmetic has recently emerged as a promising solution: it applies simple arithmetic operations, addition and negation, to task vectors (the differences between fine-tuned and pre-trained model weights) to efficiently modify model behavior. However, the full potential of task arithmetic remains underexplored, primarily because it lacks mechanisms for overcoming optimization stagnation. To address this challenge, we introduce the notion of a difference vector, a generalized form of the task vector derived from historical movements during optimization. Using difference vectors as directed perturbations, we propose the Difference Vector-based Anisotropic Scaling Iterative algorithm (DV-BASI) to enable a continuous optimization process for task arithmetic methods without relying on any additional modules or components. Notably, by leveraging the escapability and directional advantages of difference vectors, the multi-task model merged by DV-BASI may even outperform individually fine-tuned models in average performance across tasks. Based on this observation, we extend difference vectors to a feasible fine-tuning method for single-task models. On the practical side, DV-BASI allows expressive search directions with few learnable parameters and forms a scalable framework. We also integrate DV-BASI with task arithmetic methods and advanced optimization techniques to achieve state-of-the-art performance under both supervised and unsupervised evaluation protocols.
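The arithmetic at the heart of this abstract is simple enough to sketch directly. Below is a minimal, hypothetical illustration of task-vector addition and negation over flat weight arrays; the toy values and the single scaling coefficient lam are assumptions for illustration, and DV-BASI itself additionally searches anisotropic scalings and iterates with difference vectors, which is not shown here:

```python
import numpy as np

# Pre-trained weights and two fine-tuned variants (toy, flattened).
theta_pre = np.array([0.5, -1.0, 2.0])
theta_ft_a = np.array([0.7, -1.1, 2.4])
theta_ft_b = np.array([0.4, -0.6, 2.1])

# A task vector is the difference between fine-tuned and pre-trained weights.
tau_a = theta_ft_a - theta_pre
tau_b = theta_ft_b - theta_pre

# Addition: merge both tasks into one model with a shared scaling coefficient.
lam = 0.5
theta_merged = theta_pre + lam * (tau_a + tau_b)

# Negation: subtract a task vector to suppress the corresponding behavior.
theta_forget_a = theta_pre - lam * tau_a
```

A difference vector generalizes tau to the displacement between any two points visited during optimization, not only the fine-tuned and pre-trained endpoints.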
Semi-Supervised Multi-Task Learning for Interpretable Quality Assessment of Fundus Images
Telesco, Lucas Gabriel, Nejamkin, Danila, Mata, Estefanía, Filizzola, Francisco, Wignall, Kevin, Troilo, Lucía Franco, Cenoz, María de los Angeles, Thompson, Melissa, Leguía, Mercedes, Larrabide, Ignacio, Orlando, José Ignacio
Retinal image quality assessment (RIQA) supports computer-aided diagnosis of eye diseases. However, most tools classify only overall image quality, without indicating acquisition defects to guide recapture. This gap is mainly due to the high cost of detailed annotations. In this paper, we aim to mitigate this limitation by introducing a hybrid semi-supervised learning approach that combines manual labels for overall quality with pseudo-labels of quality details within a multi-task framework. Our objective is to obtain more interpretable RIQA models without requiring extensive manual labeling. Pseudo-labels are generated by a Teacher model trained on a small dataset and then used to fine-tune a pre-trained model in a multi-task setting. Using a ResNet-18 backbone, we show that these weak annotations improve quality assessment over single-task baselines (F1: 0.875 vs. 0.863 on EyeQ, and 0.778 vs. 0.763 on DeepDRiD), matching or surpassing existing methods. The multi-task model achieved performance statistically comparable to the Teacher for most detail prediction tasks (p > 0.05). In a newly annotated EyeQ subset released with this paper, our model performed similarly to experts, suggesting that pseudo-label noise aligns with expert variability. Our main finding is that the proposed semi-supervised approach not only improves overall quality assessment but also provides interpretable feedback on capture conditions (illumination, clarity, contrast). This enhances interpretability at no extra manual labeling cost and offers clinically actionable outputs to guide image recapture.
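The hybrid objective described above, a supervised loss on manual overall-quality labels plus down-weighted losses on Teacher pseudo-labels for quality details, can be sketched numerically. Everything below is an illustrative assumption (toy logits, a hypothetical weight lam, random stand-in labels), not the authors' implementation:

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy over a batch of integer class labels."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
n = 4
overall_logits = rng.normal(size=(n, 3))        # e.g. good / usable / reject
detail_logits = {k: rng.normal(size=(n, 2))     # e.g. acceptable / defective
                 for k in ("illumination", "clarity", "contrast")}

y_overall = rng.integers(0, 3, size=n)          # manual overall-quality labels
pseudo = {k: rng.integers(0, 2, size=n)         # Teacher-model pseudo-labels
          for k in detail_logits}

# Supervised overall loss plus down-weighted pseudo-label detail losses.
lam = 0.5
loss = cross_entropy(overall_logits, y_overall)
loss += lam * sum(cross_entropy(detail_logits[k], pseudo[k]) for k in detail_logits)
```

The down-weighting factor reflects that pseudo-labels are noisier than manual annotations; the paper's finding that pseudo-label noise aligns with expert variability suggests the detail terms remain informative despite it.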
To Reviewer 1:
C1: The main weakness of this work is the complexity of the approach.
R1: We do agree that our approach is complex and involves "multiple approximation steps".
C2: On Park1, the NN "finds the global optimum after one query point". How is this significant?
R2: First, the quality of the query point very much depends on the accuracy of the surrogate model.
C3: Details about hyper-parameter selection; no liberty to choose a held-out dataset in practice.
R3: We optimized the hyper-parameters to minimize the average test error. We will supplement these details.
Effective Multi-Task Learning for Biomedical Named Entity Recognition
Ruano, João, Correia, Gonçalo M., Barreiros, Leonor, Mendes, Afonso
Biomedical Named Entity Recognition presents significant challenges due to the complexity of biomedical terminology and inconsistencies in annotation across datasets. This paper introduces SRU-NER (Slot-based Recurrent Unit NER), a novel approach designed to handle nested named entities while integrating multiple datasets through an effective multi-task learning strategy. SRU-NER mitigates annotation gaps by dynamically adjusting loss computation to avoid penalizing predictions of entity types absent in a given dataset. Through extensive experiments, including a cross-corpus evaluation and human assessment of the model's predictions, SRU-NER achieves competitive performance in biomedical and general-domain NER tasks, while improving cross-domain generalization.
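SRU-NER's key trick, adjusting the loss so that a model is not penalized for predicting entity types absent from a given dataset's annotation scheme, can be sketched with a masked per-type loss. The framing below (per-type binary tagging with an annotation mask) is a simplified assumption for illustration, not the authors' slot-based recurrent decoder:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Toy setup: 5 tokens, 3 entity types, framed as per-type binary tagging.
logits = np.array([[ 2.0, -1.0,  0.5],
                   [-1.5, -0.5,  1.0],
                   [-0.5,  2.5, -1.0],
                   [ 0.0, -2.0,  0.5],
                   [ 1.5, -1.0, -0.5]])
targets = np.array([[1., 0., 0.],
                    [0., 0., 0.],
                    [0., 1., 0.],
                    [0., 0., 0.],
                    [1., 0., 0.]])

# This corpus annotates only types 0 and 1; type 2 is unannotated, so its
# predictions must not be penalized (an unlabeled mention is not a negative).
annotated = np.array([1., 1., 0.])

p = sigmoid(logits)
bce = -(targets * np.log(p) + (1 - targets) * np.log(1 - p))
loss = (bce * annotated).sum() / (annotated.sum() * len(logits))

# Gradient of masked BCE w.r.t. each logit is (p - target) times the mask,
# so the unannotated type contributes exactly zero gradient.
grad = (p - targets) * annotated / (annotated.sum() * len(logits))
```

Because the masked type receives zero gradient, a model trained jointly on several corpora can still learn that type from the corpora that do annotate it.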
Mitigating Parameter Interference in Model Merging via Sharpness-Aware Fine-Tuning
Lee, Yeoreum, Jung, Jinwook, Baik, Sungyong
ABSTRACT Large-scale deep learning models under the pretraining-finetuning paradigm have led to a surge of task-specific models fine-tuned from a common pre-trained model. Recently, several research efforts have been devoted to merging these large models into a single multi-task model, particularly via simple arithmetic on parameters. Such merging faces a central challenge: interference between model parameters fine-tuned on different tasks. A few recent works have designed new fine-tuning schemes that reduce parameter interference, but at the cost of the performance of each task-specific fine-tuned model, thereby limiting that of the merged model. To improve the performance of a merged model, we note that a fine-tuning scheme should aim for (1) smaller parameter interference and (2) better performance of each fine-tuned model on its corresponding task. In this work, we design a new fine-tuning objective function that works towards these two goals. In the process, we find this objective function to be strikingly similar to the sharpness-aware minimization (SAM) objective, which aims to achieve generalization by finding flat minima. Drawing upon this observation, we propose to fine-tune pre-trained models via sharpness-aware minimization. Experimental and theoretical results showcase the effectiveness and orthogonality of our approach, improving performance upon various merging and fine-tuning methods. Recent successes of the pretraining-finetuning paradigm have given rise to a burst of task-specific open-source models in communities such as Hugging Face. The diversity yet ready availability of large task-specific models has naturally elicited a question from researchers: can we combine these large models into one while retaining the performance on each task?
Traditionally, a single multi-task model is obtained by jointly training on data across all tasks (Caruana, 1997; Crawshaw, 2020; Vandenhende et al., 2022). However, given the size of foundation models and the number of tasks, joint training on all of them incurs significant computational costs, which makes merging already fine-tuned models an attractive alternative. Yet a central challenge remains: parameters of different task-specific models interfere or conflict with each other, leading to performance degradation of the merged multi-task model on each task.
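The SAM objective that the abstract invokes, min over theta of the maximum of L(theta + eps) over perturbations with norm at most rho, is typically approximated with a two-step update: a normalized ascent step to a worst-case neighbor, then a descent step at that neighbor applied to the original weights. The sketch below uses a toy quadratic loss with an analytic gradient and illustrative rho and lr values; it is a generic SAM sketch, not the paper's fine-tuning code:

```python
import numpy as np

def grad(w):
    """Analytic gradient of the toy quadratic loss L(w) = sum(w**2)."""
    return 2 * w

w = np.array([1.0, -2.0])
rho, lr = 0.05, 0.1

# Step 1: ascend to the (approximate) worst-case neighbor in the rho-ball,
# using the first-order solution eps = rho * g / ||g||.
g = grad(w)
eps = rho * g / (np.linalg.norm(g) + 1e-12)

# Step 2: take the descent step at the perturbed point, applied to w itself.
w = w - lr * grad(w + eps)
```

Because the gradient is evaluated at the perturbed point, sharp minima (where a small eps changes the loss a lot) are penalized, steering fine-tuning toward the flat regions the paper associates with lower parameter interference.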
Efficient Multi-Task Modeling through Automated Fusion of Trained Models
Zhou, Jingxuan, Bao, Weidong, Wang, Ji, Zhong, Zhengyi, Zhang, Dayu
Although multi-task learning is widely applied in intelligent services, traditional multi-task modeling methods often require customized designs based on specific task combinations, resulting in a cumbersome modeling process. Inspired by the rapid development and excellent performance of single-task models, this paper proposes an efficient multi-task modeling method that can automatically fuse trained single-task models with different structures and tasks to form a multi-task model. As a general framework, this method allows modelers to simply prepare trained models for the required tasks, simplifying the modeling process while fully utilizing the knowledge contained in the trained models. This eliminates the need for excessive focus on task relationships and model structure design. To achieve this goal, we consider the structural differences among various trained models and employ model decomposition techniques to hierarchically decompose them into multiple operable model components. Furthermore, we have designed an Adaptive Knowledge Fusion (AKF) module based on Transformer, which adaptively integrates intra-task and inter-task knowledge based on model components. Through the proposed method, we achieve efficient and automated construction of multi-task models, and its effectiveness is verified through extensive experiments on three datasets.